Conversation

Contributor

@marciw marciw commented Jul 24, 2025

Status: ready for PM/engineer review

🚧 not ready for tech writer review

This PR restructures and revises the downsampling section:

  • remove repetitive info and filler
  • apply a more logical structure
  • consolidate methods on one page with tabs
  • edit for clarity, conciseness, and Elastic style (partial)

closes #2239

@marciw

This comment was marked as outdated.

@marciw marciw requested review from a team as code owners August 27, 2025 19:03
@marciw marciw requested a review from yannis-roussos August 27, 2025 19:06
Contributor Author

marciw commented Aug 27, 2025

@yannis-roussos This is ready for your review, whenever you have a chance. Feel free to add other reviewers. 🙏

(I'll remove the temporary review status markers before merging)

[image placeholder: "Original metrics series" (alt: time series original)]
Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. _Downsampling_ lets you reduce the resolution and precision of older data, in exchange for increased storage space.


Suggested change
Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. _Downsampling_ lets you reduce the resolution and precision of older data, in exchange for increased storage space.
Metrics tools and solutions collect large amounts of time series data over time. As the data ages, it becomes less relevant to the current state of the system. _Downsampling_ lets you reduce the resolution and precision of older data, in exchange for decreased storage space.

Contributor Author

@yannis-roussos I'm going to revert this change -- it was syntactically/idiomatically correct as it was. You downsample, and the benefit you get "in exchange for downsampling" is increased storage space. It's a trade-off.

I think the "in exchange for" structure is idiomatic and concise, but we could spell it out more if you prefer:

"Downsampling lets you reduce the resolution and precision of older data, in order to free up storage space."
or
"Downsampling lets you reduce the resolution and precision of older data, in order to also reduce the data's storage footprint."

The last option seems good. WDYT?


@marciw OK, I get what you are saying! I agree, the way this is written can confuse people the same way it confused me. Both proposals look good to me, so go with whichever one you prefer!


@yannis-roussos yannis-roussos left a comment


Thank you @marciw, this revamp looks really great! I have added some small comments and pinged Mary to help with an additional review / pair of eyes - her expertise on downsampling will be invaluable to iron out the last few details and move forward!

4. For all other fields, copies the most recent value to the target index.
5. Deletes the original index and replaces it with the downsampled index. Within a data stream, only one index can exist for a time period.

The new, downsampled index is created on the data tier of the original index and inherits the original settings, like number of shards and replicas.


@gmarouli is this true for ILM? If my understanding is correct, this is only true for downsample actions in the hot phase (coupled with rollover), or for ones where migrate is not enabled.

If I am not wrong, can you help with rephrasing this sentence?

Contributor

Yes, this is true for ILM.

@yannis-roussos the downsampling action is executed before the migrate action so it happens on the same tier as the original index.

This has caused confusion in the past, but the benefits outweigh the cons:

  • The "hotter" the tier, the better its resources, so the more performant the downsampling will be.
  • The migrate action will have less data to move.
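For reference (not part of the PR text), the ILM downsample action under discussion is configured roughly like this; the policy name, phase, and interval here are illustrative:

```console
PUT _ilm/policy/my-metrics-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "7d",
        "actions": {
          "downsample": {
            "fixed_interval": "1h"
          }
        }
      }
    }
  }
}
```

Because the downsample action runs before any migrate step, the downsampling work happens on the tier the index is already on, as described above.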


Thank you Mary, your explanation helps a lot!

Should we add part of it here so that we don't confuse our users?

Contributor Author

@yannis-roussos sorry, what exactly do you want to add here?

Contributor

I am curious to hear what @yannis-roussos has in mind; for me this looks clear enough.


I was thinking something more or less like what you explained to me and it immediately made sense:

Suggested change
The new, downsampled index is created on the data tier of the original index and inherits the original settings, like number of shards and replicas.
Downsampling is performed before the migrate action: the new, downsampled index is created on the data tier of the original index and inherits the original settings, like number of shards and replicas.
We do so because the "hotter" tiers have more resources available, so downsampling is executed at a higher speed. That also means there is less data to migrate.

Not perfect, but I hope that it provides some clarity on the way I was thinking about this. We can of course skip this suggestion if you think that it is overly descriptive. But if this caused confusion in the past, maybe it is worth noting somewhere

Contributor

I am concerned that most users do not know about the migrate action; this is why just saying that it runs on the tier of the original index feels more helpful to me.

I have some concerns about the second sentence too. It feels like we are defending an implementation detail to our users, trying to say "it might feel weird but it has its benefits". I feel the docs are not the right place for this; if someone has concerns we can always redirect them to the relevant GitHub issue. If we put it in the docs, it feels like we are asking for reactions. For example, we do not explain to users why downsampling requires the data to be read-only, we just say it does. I think this is similar.


Makes sense Mary, let's keep it as is then!

Comment on lines 78 to 85
## Querying downsampled indices [querying-downsampled-indices]

To query a downsampled index, use the [`_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`_async_search`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-async-search-submit) endpoints.

* You can query multiple raw data and downsampled indices in a single request, and a single request can include downsampled indices with multiple downsampling intervals (for example, `15m`, `1h`, `1d`).
* When you run queries in {{kib}} and through Elastic solutions, a standard response is returned, with no indication that some of the queried indices are downsampled.
* [Date histogram aggregations](elasticsearch://reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md) support `fixed_intervals` only (not calendar-aware intervals).
* Time-based histogram aggregations use a uniform bucket size, without regard to the downsampling time interval specified in the request.
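As an illustrative sketch (not in the PR text) of the `fixed_intervals` constraint above: a date histogram over downsampled data would use a fixed interval such as `1h`, while a calendar-aware interval like `1M` would be rejected. The data stream name here is hypothetical:

```console
GET /my-metrics-stream/_search
{
  "size": 0,
  "aggs": {
    "per_hour": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h"
      }
    }
  }
}
```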


I propose keeping the same approach as in the quickstart guide and providing real examples instead of a generic meta discussion of something I have not yet seen as a user:

  • You can query multiple raw data and downsampled indices in a single request, and a single request can include downsampled indices with multiple downsampling intervals (for example, 15m, 1h, 1d).
  • When you run queries in {{kib}} and through Elastic solutions, a standard response is returned, with no indication that some of the queried indices are downsampled.

This more or less tells the user that (1) data streams can have multiple downsampling granularities set for them and that (2) you can just query a data stream as usual.

I think that we should:

  • introduce this notion of multiple granularities in different phases, with an example, in a section before this one (maybe an additional reason to move this section after a page for setting up / running downsampling?)
    and then
  • just tell the user that they should not worry about downsampling: querying over multiple downsampling granularities just works, and in general you will not even get an indication that it is there

@gmarouli could you help with a few ideas on what we could have in this section?

Contributor

I agree that this is going a bit deeper than the rest of the document. Can we convey our message with an image? Or list the things that cannot be done with downsampled data?

I am not opposed to examples, but they need to be carefully chosen to represent the exact cases we want, for example the timezone offset.

Contributor Author

@yannis-roussos @gmarouli Can one of you try to draft the changes you're suggesting or propose some concrete examples? I did create the separate page, so you'll see that the next time you review.

Contributor

The multiple granularities were addressed. About the querying: the text is also simpler now, so I am not sure if we want to continue with examples or not. @yannis-roussos what do you think?


I agree, the new dedicated page for querying downsampled data looks much simpler now. Maybe just add a note linking to the TS command once it is ready (as an update included in elastic/elasticsearch#134373)?


Before you start, make sure your index is a candidate for downsampling:

* The index must be **read-only**. You can roll over a write index and make it read-only.
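For orientation, a hedged sketch (not PR text) of the manual path described above: the first call makes the index read-only by adding a write block, and the second creates the downsampled index at one-hour resolution. The index names are illustrative.

```console
PUT /my-tsds-index/_block/write

POST /my-tsds-index/_downsample/my-tsds-index-downsampled-1h
{
  "fixed_interval": "1h"
}
```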


This is only true for running downsampling manually ("Downsampling with the API" tab). ILM and DLM will take care of this on behalf of the user.

Am I right @gmarouli?


In general I am not sure if we want to get into manual downsampling that much outside of the dedicated example (and even that one feels like too much to me in the old docs).

I propose being opinionated and just sticking to the happy path / what we propose: today it is ILM (I guess), and we can change that to DLM once we have tiers supported.

I propose this to move under the "Downsampling with the API" tab


I would also add a little more info on the upsides / downsides of each method: everything we have been talking about in the incremental downsampling docs, @gmarouli, but that is not included anywhere in the canonical docs for current downsampling. For example, that we can only have as many downsample actions as phases in ILM, or that DLM has a higher limit but no ability to move between tiers.

What do you think @gmarouli?

Contributor

I agree with @yannis-roussos in most comments. But it's not so simple either.

To start with, I think we need to agree on the scope of this page. Is it to guide the user on how to downsample their time series data streams, or how to downsample a single index? These two things are not equivalent.

  • Downsampling a single read-only index means just executing the downsample API: you get a new index, so you now have two indices, the original and the downsampled one. You have effectively downsampled the index; what to do with it is up to the user. I would ONLY recommend this for experimentation: if users want to see what downsampled data looks like, I would suggest they do that and play with the downsampled index.
  • On the other hand, downsampling a data stream with ILM or data stream lifecycle has many other steps, such as waiting for the end time of a backing index to pass, making the index read-only, executing the downsampling, replacing the original backing index with the new one, and then deleting the original. Yes, the downsampling operation is executed on the backing indices, but there are other actions that happen at the data stream level.

My recommendation/preference is:

  • The scope of this doc is downsampling time series data streams.
  • We should mention that downsampling requires the data to be read-only, so when a time range is downsampled it cannot process any more updates.
  • We only talk about data stream lifecycle and ILM. We can start with a sentence to help them choose which one to look at, for example, "serverless only look at data stream lifecycle, if you need tiers in stack look at ILM. If you do not need tiers consider using data stream lifecycle because it supports more downsampling rounds".
  • We start with data stream lifecycle and we show only the lifecycle configuration not the whole composable index template, this feels more comparable to the ILM policy.
  • We finally go through ILM.

What do you think?


I agree with Mary's proposal, and I ++ the fact that this is about downsampling time series data streams. Downsampling a standalone index is an advanced operation that is not part of the path we want to guide users towards.

Contributor Author

I tried to address some of this, but please take another look -- not sure I'm following your comments.

marciw and others added 2 commits September 8, 2025 21:02
Co-authored-by: Yannis Roussos <[email protected]>
Co-authored-by: Yannis Roussos <[email protected]>
@marciw

This comment was marked as outdated.

gmarouli

This comment was marked as outdated.

Contributor Author

marciw commented Sep 15, 2025

@yannis-roussos @gmarouli I've addressed most of your comments and asked for clarification on a few others. Please take another look when you have time. 🙏

Also note that this is just one piece of the overall restructuring -- additional PRs coming soon

Contributor Author

marciw commented Sep 15, 2025

For easy access, here's the link to the preview page. You can start here and use the "next" link at the bottom of each page: https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/2274/manage-data/data-store/data-streams/downsampling-time-series-data-stream

Contributor

@gmarouli gmarouli left a comment

Looks great @marciw , I added some small comments but I am very happy with how it looks, thank you!


1. Creates a new document for each group of documents with matching `_tsid` values (time series dimension fields), grouped into buckets that correspond to timestamps in a specific interval.

For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index. All documents within aa given hour interval are summarized and stored as a single document in the downsampled index.
Contributor

Suggested change
For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index. All documents within aa given hour interval are summarized and stored as a single document in the downsampled index.
For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index. All documents within a given hour interval are summarized and stored as a single document in the downsampled index.


To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the lifecycle.

* Set `fixed_interval` to your preferred level of granularity. The original time series data will be aggregated at this interval.
* Set `after` to the minimum time to wait after an index rollover, before running downsampling.
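A minimal sketch (illustrative values, not from the PR) of a lifecycle `downsampling` section combining both of the settings above, with two rounds of increasing granularity:

```console
PUT _data_stream/my-metrics-stream/_lifecycle
{
  "downsampling": [
    { "after": "1d", "fixed_interval": "10m" },
    { "after": "30d", "fixed_interval": "1h" }
  ]
}
```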
Contributor

@marciw do you think we should also mention here that the index's time series end time will be respected?

serverless: ga
```

To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the lifecycle.
Contributor

Should we mention here that a user can add it to the index template but it will work for new data streams only?

Suggested change
To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the lifecycle.
To downsample a time series via a data stream lifecycle, add a [downsampling](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-data-lifecycle) section to the lifecycle, either directly on an existing data stream or in an index template (which applies to new data streams only).

@yannis-roussos yannis-roussos left a comment

Thank you @marciw, looks great! I agree with Mary that this looks ready for prime time!

Successfully merging this pull request may close these issues.

Consolidate and restructure downsampling section